We have species composition data from 1,031 quadrats (1 m² each) representing 27 sites.
Most of the analyses below require the data as a quadrat-by-species matrix in presence-absence form. There are several ways to build one (one of which can keep cover values rather than converting them to binary data); I used a loop function. The result is a presence-absence matrix with Site as a column so subsamples can be organized accordingly. This column must be dropped before the matrix is passed to any analysis function, however!
| Site | SCICYP | PROPEC | JUNDEB | JUNREP | CXJOOR | CXLOUI | |
|---|---|---|---|---|---|---|---|
| 1.1.1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| 1.1.2 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
| 1.1.3 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| 1.1.4 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
| 1.1.5 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
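The conversion described above can be sketched in base R. This is only an illustration and not the loop that built the matrix shown here: it swaps in `xtabs()` for the loop, the long-format data frame `cover_long` and its columns (`quadrat`, `species`, `cover`) are hypothetical, and the cover values are made up.

```r
# Hypothetical long-format field data: one row per species record per quadrat.
# The cover values are invented for illustration.
cover_long <- data.frame(
  quadrat = c("1.1.1", "1.1.1", "1.1.2", "1.1.2", "1.1.3"),
  species = c("SCICYP", "PROPEC", "SCICYP", "JUNDEB", "SCICYP"),
  cover   = c(40, 5, 25, 10, 60)
)

# Quadrat-by-species matrix that keeps cover values (0 where a species is absent).
cover_mat <- unclass(xtabs(cover ~ quadrat + species, data = cover_long))
attr(cover_mat, "call") <- NULL   # drop xtabs bookkeeping, leaving a plain matrix

# Presence-absence version: any cover greater than zero becomes 1.
pa_mat <- (cover_mat > 0) * 1

# Keep the identifier as a column for bookkeeping ...
pa_df <- data.frame(Site = rownames(pa_mat), pa_mat)

# ... and drop it again before handing the matrix to an analysis function.
pa_only <- as.matrix(pa_df[, setdiff(names(pa_df), "Site"), drop = FALSE])
```

Keeping `cover_mat` around means the cover values are still available if a later analysis needs abundance rather than incidence.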
Observed total richness varied widely among sites. Extrapolated species accumulation curves estimate how many species are likely to be present at a site from the rate at which new species turn up as subsamples accumulate. However, if a curve never flattens, the extrapolation gives a wildly unreliable estimate of richness (see Site 10).
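To make the idea of an extrapolated richness estimate concrete, here is a small base-R sketch. The text does not say which estimator produced the site estimates, so using Chao2 here is an assumption; it is one common incidence-based estimator. The function name `chao2` is my own, and some implementations apply an extra small-sample correction that this version omits. The data are the five quadrats from the table above, restricted to the six species whose codes appear in the header.

```r
# Presence-absence records for quadrats 1.1.1 to 1.1.5 (six named species only).
pa <- matrix(
  c(1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 0, 0,
    1, 1, 1, 0, 0, 0,
    1, 1, 1, 0, 0, 1,
    1, 0, 0, 0, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("1.1.1", "1.1.2", "1.1.3", "1.1.4", "1.1.5"),
                  c("SCICYP", "PROPEC", "JUNDEB", "JUNREP", "CXJOOR", "CXLOUI"))
)

# Classic Chao2 (assumed estimator, see lead-in).
chao2 <- function(pa) {
  freq  <- colSums(pa > 0)      # number of quadrats each species occurs in
  freq  <- freq[freq > 0]       # ignore species never recorded
  s_obs <- length(freq)         # observed richness
  q1    <- sum(freq == 1)       # species found in exactly one quadrat
  q2    <- sum(freq == 2)       # species found in exactly two quadrats
  if (q2 > 0) {
    s_obs + q1^2 / (2 * q2)     # classic form
  } else {
    s_obs + q1 * (q1 - 1) / 2   # bias-corrected form, usable when q2 is 0
  }
}

s_obs      <- sum(colSums(pa) > 0)   # 6 species observed
s_est      <- chao2(pa)              # 6 + 1^2 / (2 * 2) = 6.25
completion <- 100 * s_obs / s_est    # 96 percent of estimated richness observed
```

The estimate is driven by the species seen only once or twice, which is why a site that keeps producing rare species yields a curve that never flattens and an inflated estimate.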
What do the curves that made these estimates look like? Let’s take a look!
This plot is not particularly helpful other than to visualize the general span of observed and expected richness and sampling effort. Examining the curves in groups of 3-4 is necessary to see the shape of each one.
These curves illustrate not only where the flattening point (expected richness) occurs, but also how quickly it is reached. Examining a curve lets someone estimate how many more samples would be needed to reach that point. However, if taking those extra samples means covering a larger area, the curve may never flatten.
Let’s see if sampling effort (# quadrats/area sampled) affected percent estimated sampling completion; if it did, that would be a big problem and I would have a lot of explaining to do to my committee.
There is no significant relationship between sampling effort and completion percentage (p = 0.64). However, note that Sites 10 and 26 were flagged as outliers by the autoplot function. The low completion at these two sites is likely the result of too few quadrats sampled at the wetland edge relative to the size of the wetland.
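A check of this kind can be sketched as follows. The model behind the p-value is not shown in the text, so the simple linear regression here is an assumption, and the data frame `effort_df` holds made-up values, not the study data. Standardized residuals stand in for the visual outlier check that diagnostic plots provide.

```r
# Made-up values for illustration only; not the study data.
effort_df <- data.frame(
  effort     = c(12, 20, 25, 31, 38, 44, 52, 60),   # hypothetical quadrats per unit area
  completion = c(88, 79, 93, 84, 90, 81, 86, 91)    # observed / estimated richness, in %
)

# Assumed model: completion as a linear function of sampling effort.
fit <- lm(completion ~ effort, data = effort_df)

# p-value for the slope, i.e. the test of "effort affects completion".
slope_p <- summary(fit)$coefficients["effort", "Pr(>|t|)"]

# Sites whose standardized residual exceeds 2 in absolute value are worth a
# second look; this is a numeric counterpart to eyeballing diagnostic plots.
flagged <- which(abs(rstandard(fit)) > 2)
```

A large `slope_p` only says that no linear trend was detected, so flagged sites should still be inspected individually.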
There appears to be a relationship between site richness and site area, but unexpectedly this relationship appears to be negative. Because the relationship is likely non-linear, a generalized linear model should be used to assess it.
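One way such a model could be set up is sketched below. The choice of a Poisson family is an assumption on my part (richness is a count), and the data frame `area_df` contains made-up values rather than the study data.

```r
# Made-up values for illustration only; not the study data.
area_df <- data.frame(
  area     = c(0.2, 0.5, 0.9, 1.4, 2.1, 3.0, 4.2, 6.5),   # site area, hypothetical units
  richness = c(41, 38, 35, 36, 30, 27, 29, 22)             # observed species count
)

# Richness is a count, so a Poisson GLM with a log link is a natural first try.
fit_glm <- glm(richness ~ area, family = poisson(link = "log"), data = area_df)

# Estimated area effect on the log scale; a negative value means richness
# declines as area increases.
area_effect <- coef(fit_glm)["area"]

# Rough overdispersion check: values well above 1 suggest switching to
# quasipoisson or a negative binomial model.
dispersion <- sum(residuals(fit_glm, type = "pearson")^2) / df.residual(fit_glm)
```

If the dispersion statistic is far from 1, the standard errors from the plain Poisson fit should not be trusted.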
There may be a lot of overlap in clusters due to the nestedness of some community types. We should revisit this later using quadrats as replicates.
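library(ggplot2)

# Refer to columns by bare name inside aes(); ggplot looks them up in distance.f.
ggplot(distance.f, aes(x = geographic, y = Sorensen)) +
  geom_point() +
  # Logistic fit. Sorensen is a proportion, not a count of successes, which is
  # what triggers the binomial warning printed below the plot code.
  geom_smooth(method = "glm", method.args = list(family = binomial(link = "logit"))) +
  ylab("Sorensen dissimilarity index") +
  xlab("Geographic distance (m)") +
  ggtitle("Geographic distance between sites has no effect on species similarity") +
  coord_cartesian(ylim = c(0, 1))
non-integer #successes in a binomial glm!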
Call:
glm(formula = Sorensen ~ geographic, family = binomial, data = distance.f)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.13785 -0.13608 0.02589 0.19654 0.64852
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.427e+00 1.626e-01 8.777 <2e-16 ***
geographic 6.933e-07 1.764e-06 0.393 0.694
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 43.121 on 701 degrees of freedom
Residual deviance: 42.966 on 700 degrees of freedom
AIC: 298.02
Number of Fisher Scoring iterations: 4
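For reference, the Sørensen dissimilarity used as the response above can be computed directly from presence-absence data. How `distance.f` was actually built is not shown, so this is only a sketch; the function name `sorensen_dissimilarity` is my own. The two vectors are quadrats 1.1.1 and 1.1.2 from the table earlier, restricted to the six named species.

```r
# Sorensen dissimilarity between two presence-absence vectors.
#   shared = species in both samples
#   only_x, only_y = species unique to each sample
#   dissimilarity = (only_x + only_y) / (2 * shared + only_x + only_y)
# Returns NaN if both samples are empty.
sorensen_dissimilarity <- function(x, y) {
  x <- x > 0
  y <- y > 0
  shared <- sum(x & y)
  only_x <- sum(x & !y)
  only_y <- sum(!x & y)
  (only_x + only_y) / (2 * shared + only_x + only_y)
}

q_1_1_1 <- c(SCICYP = 1, PROPEC = 1, JUNDEB = 1, JUNREP = 1, CXJOOR = 1, CXLOUI = 1)
q_1_1_2 <- c(SCICYP = 1, PROPEC = 1, JUNDEB = 1, JUNREP = 1, CXJOOR = 0, CXLOUI = 0)

d <- sorensen_dissimilarity(q_1_1_1, q_1_1_2)   # 2 / (2 * 4 + 2) = 0.2
```

Because the index is a continuous value between 0 and 1 rather than a count of successes out of a known number of trials, fitting it with `family = binomial` produces the "non-integer #successes" warning shown above.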
Ordinations
Plant communities aren’t real. We can all go home now.